System and method for biological data analysis using a Bayesian network combined with a support vector machine

ABSTRACT

A method for analyzing biological data includes classifying a first set of biological data in a first classifier, classifying a second set of biological data in a second classifier, combining the results of the first classifier with the results of the second classifier, and analyzing the results as a function of the similarity measure of the first classifier and the similarity measure of the second classifier.

CROSS REFERENCE TO RELATED UNITED STATES APPLICATIONS

This application claims priority from “A combination of Bayesian networks and an improved support vector machine for the analysis of biological data”, U.S. Provisional Application No. 60/604,233 of Cheng, et al., filed Aug. 25, 2004, the contents of which are incorporated herein by reference.

TECHNICAL FIELD

This invention is directed to the analysis of biological data using learning tools such as Bayesian networks and support vector machines (SVMs).

DISCUSSION OF THE RELATED ART

One area of concern in bioinformatics is the discovery of dependencies, e.g., in genetic networks based on microarray data, and of how those dependencies change from a healthy state due to illness. These data contain inherent noise for biological and technical reasons, and advanced technologies are required to extract useful information for subsequent classification.

Two methods for extracting these dependencies are Bayesian networks and support vector machines. Bayesian networks (BNs) are powerful tools for knowledge representation and inference under conditions of uncertainty. A Bayesian network B=[N, A, Θ] is a directed acyclic graph (DAG) in which each node n∈N represents a domain variable and each edge a∈A between nodes represents a probabilistic dependency, quantified using a conditional probability distribution θ_(i)∈Θ for each node n_(i). A BN can be used to compute the conditional probability of one node given values assigned to the other nodes; hence, a BN can be used as a classifier that gives the posterior probability distribution of the class node given the values of the other attributes. An advantage of BNs over other types of predictive models, such as neural networks, is that the Bayesian network structure represents the inter-relationships between the dataset attributes. Human experts can easily understand the network structures and, if necessary, modify them to obtain better predictive models.
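For concreteness, the joint distribution encoded by such a network factorizes over the graph; this is the standard BN factorization (background material, not specific to the claimed method):

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P\left(x_i \mid \mathrm{Pa}(x_i)\right),$$

where Pa(x_i) denotes the parents of node n_(i) in the DAG. The classifier then follows from Bayes' rule: the posterior of the class node is proportional to this joint distribution evaluated at the observed attribute values.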

Support vector machines (SVMs) are techniques that were developed for statistical pattern recognition and have been applied in many pattern recognition areas. SVMs are primarily two-class classifiers that enforce a margin between the two classes and use training patterns, called support vectors, to define the classification function. SVMs have proven to be powerful classification tools that exhibit good generalization. This can be attributed to the fact that the regularization term in an SVM not only overcomes the over-training problem that typical neural networks have, but also maximizes the separation between classes. However, an SVM does not reject data that do not meet the classification criteria. Use of a decision threshold can make an SVM reject data, but such SVMs have poor rejection performance, because the SVM produces a large decision region for each class, leading to high false alarm rates.

SUMMARY OF THE INVENTION

Exemplary embodiments of the invention as described herein generally include methods and systems for combining a Bayesian network with an improved SVM for the analysis of biological data. A new support vector representation and discrimination machine has discrimination performance comparable to that of the SVM, but much better rejection performance, while a new BN learning algorithm is based on a three-phase dependency analysis, which is especially suitable for data mining in high-dimensional data sets due to its efficiency. The performance of the SVMs is improved by addressing rejection-classification, where there are M object classes to be discriminated and one non-object class to be rejected. This non-object class can be anything except the M object classes.

According to an aspect of the invention, there is provided a method for analyzing biological data, the method including classifying a first set of biological data in a first classifier, classifying a second set of biological data in a second classifier, combining the results of the first classifier with the results of the second classifier, and analyzing the results as a function of the similarity measure of the first classifier and the similarity measure of the second classifier.

According to a further aspect of the invention, the first set of biological data and the second set of biological data are the same.

According to a further aspect of the invention, the first classifier is a support vector representation and discrimination machine.

According to a further aspect of the invention, the second classifier is a Bayesian network.

According to a further aspect of the invention, the first set of biological data is a set of microarray data.

According to a further aspect of the invention, the second set of biological data is a set of protein mass spectra.

According to a further aspect of the invention, the results of the first classifier and the second classifier are combined in parallel.

According to a further aspect of the invention, the Bayesian network comprises computing mutual information of pairs of data of said data set; creating a draft network based on the mutual information, wherein data items of said data set comprise nodes of said network and the edges connecting a pair of data nodes represent the mutual information of said nodes; thickening said network by adding edges when pairs of data nodes cannot be d-separated; and thinning said network by analyzing each edge of said draft network with a conditional independence test and removing said edge if said corresponding data nodes can be d-separated.

According to a further aspect of the invention, the combining step comprises weighting the results of the first and second classifiers based on the input patterns.

According to another aspect of the invention, there is provided a program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform the method steps for analyzing biological data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a schematic diagram of the combination of the two classifiers according to an embodiment of the invention.

FIG. 2 illustrates the decision uncertainty for two exemplary probability distribution functions according to an embodiment of the invention.

FIG. 3 illustrates a combination of two classifier distributions for two different classes according to an embodiment of the invention.

FIG. 4 depicts a simple multi-connected network, according to an embodiment of the invention.

FIG. 5 is a block diagram of an exemplary computer system for implementing a combined BN and SVM according to an embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Exemplary embodiments of the invention as described herein generally include systems and methods for combining two classifiers, both for analyzing the same type of data and for analyzing data from different sources carrying different biomolecular information.

According to an embodiment of the invention, a first combination of a support vector representation and discrimination machine (SVRDM) and a Bayesian network (BN) is utilized for analyzing the same set of microarray data. A second combination of the SVRDM and BN is utilized for analyzing the same set of protein mass spectra, while in a third combination, an SVRDM is utilized to analyze protein mass spectra and a BN is utilized to analyze microarray data, or vice versa.

Before describing the SVRDM, a support vector representation machine (SVRM) will be described. Consider two classes, where C₁ is the object class and C₀ is the nonobject class. The task of one-class classification is to find the decision region R₁ for C₁ such that if an input x∈R₁, x is assigned to C₁; otherwise, it is rejected as C₀. Suppose there are N training vectors {x₁, . . . , x_(N)} from C₁ and no training vectors from C₀. The training task is to find an evaluation function ƒ₁(x), which gives the confidence of the input x being in the object class. The region R₁ is defined as R₁={x: ƒ₁(x)≧T} to contain those object samples x giving evaluation-function values above some threshold T. To achieve a high recognition rate, training vectors should produce high evaluation-function values.

A mapping from the input space to a high-dimensional feature space is defined as Φ: R→F, where R is the input space and F is the transformed feature space. The explicit form of Φ and calculation of Φ(x) are not necessary. Rather, only the inner product Φ(x_(i))^(T)Φ(x_(j)) need be specified to be some kernel function; to evaluate Φ^(T)Φ, one evaluates the associated kernel function. According to an embodiment of the invention, a Gaussian kernel exp(−|x_(i)−x_(j)|²/2σ²) is used, since it simplifies volume estimation and has other desirable properties. For a Gaussian kernel, the transformed training and test vectors lie on the unit sphere centered at the origin in F. Since the data are automatically normalized to be of unit length, the distance between two vectors in F can be represented by their inner product. Thus, the inner product ƒ₁(x)=h^(T)Φ(x) can be used as an evaluation function, where h is a vector in F computed from the training set. This vector describes the SVRM and is used to determine the class of test inputs.
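By way of illustration only (not part of the original disclosure), because h is a linear combination of the transformed training vectors (see below), the evaluation function can be computed entirely through the kernel. A minimal Python sketch, where the coefficient vector alpha is assumed to have been obtained by training:

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        # k(x, y) = exp(-|x - y|^2 / (2 sigma^2)); maps every vector to unit
        # length in the feature space F
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

    def evaluate(x, train, alpha, sigma=1.0):
        # f1(x) = h^T Phi(x) with h = sum_i alpha_i Phi(x_i),
        # hence f1(x) = sum_i alpha_i k(x_i, x)
        return sum(a * gaussian_kernel(xi, x, sigma) for a, xi in zip(alpha, train))

    def svrm_classify(x, train, alpha, T=0.8, sigma=1.0):
        # accept as object class C1 if f1(x) >= T, otherwise reject as C0
        return 1 if evaluate(x, train, alpha, sigma) >= T else 0

The threshold T=0.8 here is an arbitrary illustrative value below 1, reflecting the practical choice of T&lt;1 discussed below.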

The solution h for the SVRM satisfies

$$\min \frac{|h|^2}{2} \quad \text{subject to} \quad h^T \Phi(x_i) \geq T = 1, \quad i = 1, \ldots, N.$$

The second condition above ensures large evaluation-function values for the training set, greater than some threshold T, which is preferably equal to 1. The norm |h| of h is minimized in the first condition to reduce the volume of R₁ and thereby provide rejection of nonobjects. It can be shown that a solution h with a lower norm provides a smaller class-C₁ acceptance volume. Outliers (errors) are expected, however, and the second constraint above will not be satisfied for all of the training set. Thus, slack variables ξ_(i) are introduced, and h satisfies

$$\min \left\{ \frac{|h|^2}{2} + C \sum_{i=1}^{N} \xi_i \right\}, \quad h^T \Phi(x_i) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N.$$

This allows for classification errors by amounts ξ_(i) for various training-set samples x_(i). The factor C in the first condition is the weight of the penalty term for the slack variables. The solution h is a linear combination of the support vectors, which are a small portion of the entire training set. To classify an input x, form the inner product h^(T)Φ(x); if this is at or above some threshold T, classify x as a member of the object class. In many circumstances, the training set is not adequate to represent the test set. Thus, in practice, a threshold T&lt;1 is used in the above equations, giving a decision region that is larger than that occupied by only the training data.
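As a hypothetical numerical sketch (an illustrative solver choice, not the method prescribed by this disclosure), writing h = Σ_i α_i Φ(x_i) turns the slack formulation into a finite problem in α: |h|² = αᵀKα and h^(T)Φ(x_i) = (Kα)_i, where K is the kernel matrix with K_ij = k(x_i, x_j). A generic constrained optimizer can then be applied:

    import numpy as np
    from scipy.optimize import minimize

    def train_svrm(K, C=10.0, T=1.0):
        # variables z = [alpha (N values), xi (N values)]
        N = K.shape[0]

        def objective(z):
            a, xi = z[:N], z[N:]
            # alpha^T K alpha / 2 + C * sum(xi)
            return 0.5 * a @ K @ a + C * np.sum(xi)

        constraints = [
            # (K alpha)_i >= T - xi_i for every training sample
            {"type": "ineq", "fun": lambda z: K @ z[:N] - T + z[N:]},
            # xi_i >= 0
            {"type": "ineq", "fun": lambda z: z[N:]},
        ]
        # start at alpha = 0 with xi = T, a feasible point
        z0 = np.concatenate([np.zeros(N), np.full(N, T)])
        res = minimize(objective, z0, constraints=constraints, method="SLSQP")
        return res.x[:N]  # alpha coefficients defining h

    # usage: K = np.array([[gaussian_kernel(xi, xj) for xj in train] for xi in train])
    #        alpha = train_svrm(K)

A dedicated quadratic-programming solver would be the more efficient choice in practice; SLSQP is used here only to keep the sketch self-contained.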

An SVRDM classifier is an SVRM extended to the multiple-object-class case. Consider K object classes with N_(k) training samples per class; the training vectors for class k are {x_(ki)}. To consider classification and rejection, define P_(C) as the classification rate, which is the percentage of the object-class samples that are classified in the correct object class, and define P_(R) as the rejection rate, which is the rate of object-class samples rejected as the nonobject class. P_(E) is defined as the classification error rate, which is the rate of object-class samples classified in the wrong object classes. Thus, P_(C)+P_(R)+P_(E)=1. P_(FA) is the percentage of the nonobject-class samples mistakenly classified as being in an object class (i.e., false alarms). The objective is to obtain a high P_(C) and a low P_(FA). The classifier approach is to obtain K functions h_(k); each discriminates one of the K classes {k} from the other K−1 classes. For a given test input x, calculate the vector inner product (VIP) of Φ(x) with each h_(k). If any of these kernel VIPs are ≧T, x is assigned to the class producing the maximum VIP value; otherwise it is rejected. It is assumed that there are no nonobject-class samples in the training set. For simplicity, consider first a two-object-class problem. For class 1 samples x_(1i), the evaluation-function VIP output is h₁^(T)Φ(x_(1i))≧T and h₂^(T)Φ(x_(1i))≦p. For class 2 samples x_(2j), the output is h₂^(T)Φ(x_(2j))≧T and h₁^(T)Φ(x_(2j))≦p. The parameter p is the maximum evaluation-function value that can be accepted for the other object-class samples. The two solution vectors h₁ and h₂ thus satisfy

$$\min \frac{|h_1|^2}{2}, \quad h_1^T \Phi(x_{1i}) \geq 1, \quad i = 1, \ldots, N_1, \quad h_1^T \Phi(x_{2j}) \leq p, \quad j = 1, \ldots, N_2,$$

and

$$\min \frac{|h_2|^2}{2}, \quad h_2^T \Phi(x_{2i}) \geq 1, \quad i = 1, \ldots, N_2, \quad h_2^T \Phi(x_{1j}) \leq p, \quad j = 1, \ldots, N_1.$$

Note that the VIP kernel-function value for the object class to be discriminated against is specified to be p in this case. The difference in the formulation of the SVRM and the SVRDM lies in the third condition above; this condition provides discrimination information between object classes by using p>−1 (the SVM solution is p=−1) and provides rejection of nonobjects. In the presence of outliers (training-class errors), slack variables are of course used in both h₁ and h₂. The final version for h₁ is thus

$$\min \left\{ \frac{|h_1|^2}{2} + C \left( \sum \xi_{1i} + \sum \xi_{2j} \right) \right\}, \quad h_1^T \Phi(x_{1i}) \geq 1 - \xi_{1i}, \quad i = 1, \ldots, N_1,$$
$$h_1^T \Phi(x_{2j}) \leq p + \xi_{2j}, \quad j = 1, \ldots, N_2, \quad \xi_{1i} \geq 0, \quad \xi_{2j} \geq 0,$$

and h₂ is similar.

For a K-class problem, an SVRDM contains K functions h_(k), each similar to the h₁ above. Each recognizes one of the K classes (training-set samples x_(ki)) with a vector inner product ≧1 and all other training-set samples in the other K−1 classes (training-set samples x_(mj), where m≠k) with a vector inner product ≦p. For a test input x, if the maximum of the transformed vector inner products over all of the K functions h_(k) is ≧T, the test sample is placed in the class that produces the maximum vector inner product; otherwise, it is rejected as a non-object.
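The resulting decision rule is straightforward to express in code. A minimal sketch (names hypothetical), reusing the evaluate function above and assuming the K coefficient vectors have already been trained:

    import numpy as np

    def svrdm_decide(x, classes, T=0.8, sigma=1.0):
        # classes: list of (train_vectors, alpha) pairs, one per object class
        vips = [evaluate(x, train, alpha, sigma) for train, alpha in classes]
        best = int(np.argmax(vips))
        # assign to the class with the maximum VIP if it clears T;
        # otherwise reject the input as a non-object
        return best if vips[best] >= T else None  # None = rejected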

A Bayesian network (BN) is a probabilistic graphical model in which the nodes represent random variables and the edges represent conditional independence assumptions. In addition, a conditional probability distribution (CPD) is associated with each node. A BN is also a directed graph, in which the direction of an edge is indicative of a deterministic relationship between the nodes. The CPD then indicates the probability that a child node takes on each of its different values for each combination of parent node values. Bayesian networks allow one to carry out chains of probabilistic inference, including determining causation and explaining away.

According to an embodiment of the invention, an efficient algorithm for constructing Bayesian belief networks from databases takes a database and an attribute ordering (i.e., the causal attributes of an attribute should appear earlier in the order) as input and constructs a belief network structure as output. A belief network can be viewed as a network system of information channels, where each node is a valve that is either active or inactive and the valves are connected by noisy information channels (edges). The information flow can pass through an active valve but not an inactive one. When all the valves (nodes) on one undirected path between two nodes are active, the path is said to be open. If any one valve in the path is inactive, the path is said to be closed. When all paths between two nodes are closed given the status of a set of valves (nodes), the two nodes are said to be d-separated by the set of nodes. The status of valves can be changed through the instantiation of a set of nodes. The amount of information flow between two nodes can be measured using mutual information, when no nodes are instantiated, or conditional mutual information, when some other nodes are instantiated.

In information theory, the mutual information of two nodes X_(i), X_(j) is defined as

$$I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)},$$

and conditional mutual information is defined as

$$I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c) P(x_j \mid c)},$$

where X_(i), X_(j) are two nodes and C is a set of nodes. According to an embodiment of the invention, conditional mutual information is used as a conditional independence test to measure the average information between two nodes when the statuses of some valves are changed by the condition-set C. When I(X_(i), X_(j)|C) is smaller than a certain threshold value ε, X_(i) and X_(j) are said to be d-separated by the condition-set C, and they are conditionally independent. This algorithm also makes the following assumptions: (1) the database attributes have discrete values and there are no missing values in any of the records; (2) the volume of data is large enough for reliable conditional independence tests; and (3) the ordering of the attributes is available before the network construction, i.e., a node's parent nodes should appear earlier in the order.
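For illustration, the mutual information between two discrete attributes can be estimated directly from co-occurrence counts; a minimal sketch (not tied to any particular dataset):

    import numpy as np
    from collections import Counter

    def mutual_information(xs, ys):
        # I(X, Y) = sum over (x, y) of P(x, y) * log(P(x, y) / (P(x) P(y)))
        n = len(xs)
        pxy = Counter(zip(xs, ys))
        px, py = Counter(xs), Counter(ys)
        mi = 0.0
        for (x, y), c in pxy.items():
            # P(x,y)/(P(x)P(y)) simplifies to c*n / (count(x)*count(y))
            mi += (c / n) * np.log(c * n / (px[x] * py[y]))
        return mi

A conditional version would apply the same computation within each stratum c of the condition set and weight the results by P(c).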

According to an embodiment of the invention, this algorithm has three phases: drafting, thickening, and thinning. In the first phase, the algorithm computes the mutual information of each pair of nodes as a measure of closeness and creates a draft based on this information. In the second phase, the algorithm adds edges when pairs of nodes cannot be d-separated. The result of the second phase is an independence map of the underlying dependency model. In the third phase, each edge of the independence map is examined using conditional independence tests and is removed if the two nodes of the edge can be d-separated.

According to an embodiment of the invention, the drafting phase can be summarized as follows; an illustrative code sketch appears after the steps.

1. Initiate a graph G(V, E) where V={all the nodes of a data set} and E={}. Initiate two empty ordered sets S, R.

2. For each pair of nodes (ν_(i), ν_(j)) where ν_(i), ν_(j)∈V, compute the mutual information I(ν_(i), ν_(j)). For those pairs of nodes that have mutual information greater than a certain small value ε, sort them by their mutual information from large to small and put them into an ordered set S.

3. Remove the first two pairs of nodes from S. Add the corresponding edges to E (the direction of the edges in this algorithm is determined by the previously available node ordering).

4. Remove the first remaining pair of nodes from S. If there is no open path between the two nodes (i.e., the two nodes are d-separated given the empty set), add the corresponding edge to E; otherwise, add the pair of nodes to the end of an ordered set R.

5. Repeat step 4 until S is empty.
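A hedged Python sketch of this phase (illustrative only), reusing mutual_information from above; has_open_path is a hypothetical helper that tests whether an undirected open path already connects two nodes in the current edge set, and data is assumed to map each node name to its column of discrete values:

    from itertools import combinations

    def draft(data, eps, has_open_path):
        # Phase I: rank node pairs by mutual information, then grow a draft graph
        V = list(data.keys())
        scored = sorted(
            ((mutual_information(data[a], data[b]), a, b)
             for a, b in combinations(V, 2)),
            reverse=True)
        S = [(a, b) for mi, a, b in scored if mi > eps]
        E, R = [], []
        E.extend(S[:2])              # step 3: the two strongest pairs
        for a, b in S[2:]:           # steps 4 and 5
            if not has_open_path(E, a, b):
                E.append((a, b))     # d-separated given the empty set
            else:
                R.append((a, b))     # defer the pair to Phase II
        return E, R

Edge directions, which the algorithm takes from the given node ordering, are omitted here for brevity.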

For the purpose of illustrating this algorithm's working mechanism, consider a database whose underlying Bayesian network is illustrated in FIG. 4a, and order the nodes as A, B, C, D, E. After step 2, one can get the mutual information of all 10 pairs of nodes. Suppose I(B,D)≧I(C,E)≧I(B,E)≧I(A,B)≧I(B,C)≧I(C,D)≧I(D,E)≧I(A,D)≧I(A,E)≧I(A,C), and that all the mutual information values are greater than ε; one can then construct a draft graph as shown in FIG. 4b after step 5. Note that the order of mutual information between nodes is not arbitrary. For example, from information theory, I(A,C)&lt;Min(I(A,B),I(B,C)). When the underlying graph is sparse, Phase I can construct a graph very close to the original one. If the underlying graph is a singly connected graph (a graph without an undirected cycle), Phase I guarantees that the constructed network is the same as the original one. In this example, (B,E) is wrongly added, and (D,E) is missing because of the existing open paths (D-B-E) and (D-B-C-E). The draft graph created in this phase is the basis for the next phase.

According to an embodiment of the invention, the thickening phase can be summarized as follows.

6. Remove the first pair of nodes from R.

7. Find a block set that blocks each open path between these two nodes using a minimum number of nodes. Conduct a conditional independence test. If these two nodes are still dependent on each other given the block set, connect them by an edge.

8. Go to step 6 until R is empty.

The graph after Phase II is shown in FIG. 4c. When this algorithm examines the pair of nodes (D,E) in step 7, it finds that {B} is the minimum set that blocks all the open paths between D and E. Since the conditional independence test reveals that D and E are still dependent given {B}, edge (D,E) is added. Edge (A,C) is not added because the conditional independence test reveals that A and C are independent given block set {B}. Edges (A,D), (C,D), and (A,E) are not added for the same reason. In this phase, the algorithm examines all pairs of nodes that have mutual information greater than ε; an edge is not added when the two nodes are independent given some block set. It is possible that some edges are wrongly added in this phase.
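A hedged sketch of this phase, with find_block_set (a version of which is described further below) and ci_test as hypothetical helpers; ci_test returns True when its two nodes are independent given the block set, i.e., when the conditional mutual information falls below ε:

    def thicken(E, R, data, find_block_set, ci_test):
        # Phase II: re-examine the deferred pairs; connect those pairs
        # that remain dependent given their block set
        for a, b in R:
            block = find_block_set(E, a, b)
            if not ci_test(data, a, b, block):  # still dependent given block
                E.append((a, b))
        return E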

According to an embodiment of the invention, the thinning phase can be summarized as follows.

9. For each edge in E, if there are open paths between the two nodes besides this edge, remove this edge from E temporarily and find a block set that blocks each open path between these two nodes using a minimum number of nodes. Conduct a conditional independence test on the condition of the block set. If the two nodes are dependent, add this edge back to E; otherwise remove the edge permanently. The ‘thinned’ graph is shown in FIG. 4d, which is the same as the original graph. Edge (B,E) is removed because B and E are independent given {C,D}.
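A corresponding sketch of this phase, using the same hypothetical helpers as above:

    def thin(E, data, has_open_path, find_block_set, ci_test):
        # Phase III: tentatively drop each edge that has an alternate open path
        for edge in list(E):
            a, b = edge
            rest = [e for e in E if e != edge]
            if has_open_path(rest, a, b):
                block = find_block_set(rest, a, b)
                if ci_test(data, a, b, block):
                    E.remove(edge)  # endpoints can be d-separated: remove
        return E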

According to an embodiment of the invention, an algorithm for finding a block set that blocks each open path between two nodes using a minimum number of nodes is as follows. Because this procedure uses a greedy search method, it does not guarantee that a minimum block set is found.

Procedure find_block_set(current graph, node1, node2)
begin
    find all the undirected paths between node1 and node2;
    store the open paths in open_path_set, store the closed paths in closed_path_set;
    do
        while there are open paths which have only one node do
            store the nodes of each such path in the block set;
            remove all the paths blocked by these nodes from the open_path_set and closed_path_set;
            from the closed_path_set, find paths opened by the nodes in the block set and move them to the open_path_set; shorten such paths by removing the nodes that are also in the block set;
        end while
        if there are open paths do
            find a node which can block the maximum number of the remaining paths and put it in the block set;
            remove all the paths blocked by the node from the open_path_set and the closed_path_set;
            from the closed_path_set, find paths opened by this node and move them to the open_path_set; shorten such paths by removing the nodes that are also in the block set;
        end if
    until there are no open paths
end procedure.
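A simplified Python rendering of the core greedy loop (illustrative only; it omits the re-opening and shortening of closed paths that the full procedure above performs):

    from collections import Counter

    def greedy_block_set(open_paths):
        # open_paths: list of paths, each a non-empty collection of the
        # intermediate nodes lying on one open path
        block = set()
        remaining = [set(p) for p in open_paths]
        while remaining:
            # forced choices first: single-node open paths must be blocked
            forced = {n for p in remaining if len(p) == 1 for n in p}
            if forced:
                block |= forced
            else:
                # otherwise pick the node lying on the most remaining paths
                counts = Counter(n for p in remaining for n in p)
                block.add(counts.most_common(1)[0][0])
            remaining = [p for p in remaining if not (p & block)]
        return block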

According to an embodiment of the invention, a schematic diagram of the combination of the two classifiers, a support vector representation and discrimination machine (SVRDM) and a Bayesian network (BN), is illustrated in FIG. 1. Biological data 10 is input to both classifiers 11, 12, in parallel. A combiner 14 uses the intermediate results 13 of both classifiers at the same time. The result 15 is a merged decision, which is itself a classification. Combining classifiers allows one to combine classifiers trained on different feature sets, different training sets, different classification methods, or different training sessions, to improve overall classification accuracy.

A single classifier usually has a particular uncertainty in its decision, which can be described by a probability density function for each class. FIG. 2 illustrates the decision uncertainty for two exemplary probability distribution functions (PDFs). The graph 20 shows the PDFs plotted as a function of similarity measure. The PDFs of a classifier for two different classes are illustrated and are assumed to have a bell shape. A PDF for class A is represented by curve 21, while a PDF for class B is represented by curve 22. As can be seen from the figure, the two bell curves may overlap, and the area of the overlap is a measure of the quality of the classification algorithm. Classification results that fall between the two bell curves have high uncertainty, whereas classification results far away from the middle have low uncertainty.

According to an embodiment of the invention, a classifier uses a combination of two classifiers that are not strongly correlated with each other, so that if a given instance is classified by one classifier with high uncertainty, the other can give a classification with low uncertainty, and vice versa. In that situation, a combined decision can be generated with a lower uncertainty. FIG. 3 illustrates a combination of two classifier distributions for two different classes. In the graph 30, the similarity measure of the first classifier is plotted along the horizontal axis, while the similarity measure of the second classifier is plotted along the vertical axis, with the shapes of the corresponding PDFs sketched along the corresponding axes. The regions of the similarity space where the corresponding PDFs are at a maximum are indicated by the ellipses 31, 32. Ellipse 31 indicates the region where the PDF for class A is at a maximum, while ellipse 32 indicates the region where the PDF for class B is at a maximum. The raw data can be identical for both classifiers, or the data can differ but describe the same classes.

By combining the intermediate results of both classifiers, even if there is a large overlap in the class PDFs for each classifier individually, the area of overlap of the combined PDFs should decrease, providing improved classification performance. Note that, in accordance with an embodiment of the invention, the combiner is a classifier itself, which has as input the classification results of the BN and SVRDM with their uncertainties. The combiner then divides the decision space into two parts, which correspond to the two different classes.

The type of classifier incorporated in the combiner depends on the distribution of the feature vector data as well as on the distribution of the classification results of the BN and SVRDM. Some combiners are adaptive, in that the combiner weights the decisions of the individual classifiers depending on the input patterns. Adaptive combination schemes can also exploit the detailed error characteristics and expertise of the individual classifiers. In addition, different combiners expect different types of output from the individual classifiers. These expectations can be categorized into three groups: (1) measurement (or confidence); (2) rank; and (3) abstract. At the confidence level, an individual classifier outputs a numerical value for each class indicating the probability that the given input pattern belongs to that class. At the rank level, the classifier assigns a rank to each class, with the highest rank being the first choice. At the abstract level, a classifier only outputs a unique class label (or several class labels when the classes are equally probable). The confidence level imparts the most information, while the abstract level imparts the least information about the decision being made.
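As an illustration of a confidence-level combiner, a minimal weighted-sum sketch (the weighting scheme is hypothetical, one of many the combiner could use):

    import numpy as np

    def combine(conf_svrdm, conf_bn, w_svrdm=0.5, w_bn=0.5):
        # conf_*: per-class confidence vectors from the two classifiers
        combined = w_svrdm * np.asarray(conf_svrdm) + w_bn * np.asarray(conf_bn)
        return int(np.argmax(combined))

An adaptive combiner would instead derive the two weights per input pattern, e.g., from each classifier's decision uncertainty on that pattern.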

The SVRDM has been applied to ship and face databases and has proven superior to the SVM in terms of rejection and classification. Empirical results on a set of standard benchmark datasets show that Bayesian networks are excellent classifiers.

Combination classifiers in accordance with an embodiment of the invention have application to the combined analysis of protein and gene expression data for healthy persons and for patients with certain illnesses, such as lung cancer. Models for each data set and for each classifier can be built, and the combination will then give a combined model, which allows a mapping of genotype information to phenotype information.

It is to be understood that the present invention can be implemented in various forms of hardware, software, firmware, special purpose processes, or a combination thereof. In one embodiment, the present invention can be implemented in software as an application program tangibly embodied on a computer readable program storage device. The application program can be uploaded to, and executed by, a machine comprising any suitable architecture.

FIG. 5 is a block diagram of an exemplary computer system for implementing a combined BN and SVM according to an embodiment of the invention. Referring now to FIG. 5, a computer system 51 for implementing the present invention can comprise, inter alia, a central processing unit (CPU) 52, a memory 53, and an input/output (I/O) interface 54. The computer system 51 is generally coupled through the I/O interface 54 to a display 55 and various input devices 56, such as a mouse and a keyboard. The support circuits can include circuits such as cache, power supplies, clock circuits, and a communication bus. The memory 53 can include random access memory (RAM), read only memory (ROM), disk drive, tape drive, etc., or a combination thereof. The present invention can be implemented as a routine 57 that is stored in memory 53 and executed by the CPU 52 to process the signal from the signal source 58. As such, the computer system 51 is a general purpose computer system that becomes a specific purpose computer system when executing the routine 57 of the present invention.

The computer system 51 also includes an operating system and micro instruction code. The various processes and functions described herein can either be part of the micro instruction code or part of the application program (or a combination thereof) which is executed via the operating system. In addition, various other peripheral devices can be connected to the computer platform, such as an additional data storage device and a printing device.

It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures can be implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings of the present invention provided herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended as to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified, and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.

1. A method for analyzing biological data comprising the steps of: classifying a first set of biological data in a first classifier; classifying a second set of biological data in a second classifier; combining the results of the first classifier with the results of the second classifier; and analyzing the results as a function of the similarity measure of the first classifier and the similarity measure of the second classifier.

2. The method of claim 1, wherein the first set of biological data and the second set of biological data are the same.

3. The method of claim 1, wherein the first classifier is a support vector representation and discrimination machine.

4. The method of claim 1, wherein the second classifier is a Bayesian network.

5. The method of claim 1, wherein the first set of biological data is a set of microarray data.

6. The method of claim 1, wherein the second set of biological data is a set of protein mass spectra.

7. The method of claim 1, wherein the results of the first classifier and the second classifier are combined in parallel.

8. The method of claim 4, wherein said Bayesian network comprises computing mutual information of pairs of data of said data set; creating a draft network based on the mutual information, wherein data items of said data set comprise nodes of said network and the edges connecting a pair of data nodes represent the mutual information of said nodes; thickening said network by adding edges when pairs of data nodes cannot be d-separated; and thinning said network by analyzing each edge of said draft network with a conditional independence test and removing said edge if said corresponding data nodes can be d-separated.

9. The method of claim 1, wherein said combining step comprises weighting the results of the first and second classifiers based on the input patterns.

10. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform the method steps for analyzing biological data, said method comprising the steps of: classifying a first set of biological data in a first classifier; classifying a second set of biological data in a second classifier; combining the results of the first classifier with the results of the second classifier; and analyzing the results as a function of the similarity measure of the first classifier and the similarity measure of the second classifier.

11. The computer readable program storage device of claim 10, wherein the first set of biological data and the second set of biological data are the same.

12. The computer readable program storage device of claim 10, wherein the first classifier is a support vector representation and discrimination machine.

13. The computer readable program storage device of claim 10, wherein the second classifier is a Bayesian network.

14. The computer readable program storage device of claim 10, wherein the first set of biological data is a set of microarray data.

15. The computer readable program storage device of claim 10, wherein the second set of biological data is a set of protein mass spectra.

16. The computer readable program storage device of claim 10, wherein the results of the first classifier and the second classifier are combined in parallel.

17. The computer readable program storage device of claim 13, wherein said Bayesian network comprises computing mutual information of pairs of data of said data set; creating a draft network based on the mutual information, wherein data items of said data set comprise nodes of said network and the edges connecting a pair of data nodes represent the mutual information of said nodes; thickening said network by adding edges when pairs of data nodes cannot be d-separated; and thinning said network by analyzing each edge of said draft network with a conditional independence test and removing said edge if said corresponding data nodes can be d-separated.

18. The computer readable program storage device of claim 10, wherein said combining step comprises weighting the results of the first and second classifiers based on the input patterns.

19. A method for analyzing biological data comprising the steps of: classifying a first set of biological data in a first classifier, wherein said first classifier is a support vector representation and discrimination machine, wherein said machine discriminates said data into a plurality of classes using a plurality of discrimination functions, wherein an inner product of each said discrimination function with a kernel function is evaluated on said data, wherein the norm of each said discrimination function is minimized, and wherein the value of each said inner product is compared to a threshold to determine whether a biological data item belongs to a class associated with said discrimination function; classifying a second set of biological data in a second classifier, wherein said second classifier is a Bayesian network; and analyzing the combined results of said first classifier and said second classifier as a function of the similarity measure of the first classifier and the similarity measure of the second classifier.